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Abstract —We study the problem of distributed adaptive esti¬ 
mation over networks where nodes cooperate to estimate physical 
parameters that can vary over both space and time domains. 
We use a set of basis functions to characterize the space- 
varying nature of the parameters and propose a diffusion least 
mean-squares (LMS) strategy to recover these parameters from 
successive time measurements. We analyze the stability and 
convergence of the proposed algorithm, and derive closed-form 
expressions to predict its learning behavior and steady-state 
performance in terms of mean-square error. We find that in 
the estimation of the space-varying parameters using distributed 
approaches, the covariance matrix of the regression data at 
each node becomes rank-deficient. Our analysis reveals that the 
proposed algorithm can overcome this difficulty to a large extent 
by benefiting from the network stochastic matrices that are used 
to combine exchanged information between nodes. We provide 
computer experiments to illustrate and support the theoretical 
findings. 

Index Terms —Diffusion adaptation, distributed processing, pa¬ 
rameter estimation, space-varying parameters, sensor networks, 
interpolation. 

I. INTRODUCTION 

I N previous studies on diffusion algorithms for adapta¬ 
tion over networks, including least-mean-squares (LMS) 
or recursive least squares (RLS) types, the parameters being 
estimated are often assumed to be space-invariant [l]-[6]. In 
other words, all agents are assumed to sense and measure 
data that arise from an underlying physical model that is 
represented by fixed parameters over the spatial domain. 
Some studies considered particular applications of diffusion 
strategies to data that arise from potentially different models 
[7], [8]. However, the proposed techniques in these works are 
not immediately applicable to scenarios where the estimation 
parameters vary over space across the network. This situation 
is encountered in many applications, including the monitoring 
of fluid flow in underground porous media [9], the tracking 
of population dispersal in ecology [10], the localization of 
distributed sources in dynamic systems [11], and the modeling 
of diffusion phenomena in inhomogeneous media [12]. In 
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these applications, the space-varying parameters being esti¬ 
mated usually result from discretization of the coefficients 
of an underlying partial differential equation through spatial 
sampling. 

The estimation of spatially-varying parameters has been 
addressed in several previous studies, including [13]-[17]. In 
these works and other similar references on the topic, the solu¬ 
tions typically rely on the use of a central processing (fusion) 
unit and less attention is paid to distributed and in-network 
processing solutions. Distributed algorithms are useful in large 
networks when there is no powerful fusion center and when 
the energy and communication resources of individual nodes 
are limited. Many different classes of distributed algorithms 
for parameter estimation over networks have been proposed 
in the recent literature, including incremental method [18]- 
[22], consensus methods [23]-[34], and diffusion methods [2], 
[3], [6], [35]-[37]. Incremental techniques require to set-up a 
cyclic path between nodes over the network and are there¬ 
fore sensitive to link failures. Consensus techniques require 
doubly-stochastic combination policies and can cause network 
instability in applications involving continuous adaptation and 
tracking [5]. In comparison, diffusion strategies demonstrate 
a stable behavior over networks regardless of the topology 
and endow networks with real-time adaptation and learning 
abilities [5], [6], [36]. 

Motivated by these considerations, in this paper, we develop 
a distributed LMS algorithm of the diffusion type to enable 
the estimation and tracking of parameters that may vary over 
both space and time. Our approach starts by introducing a 
linear regression model to characterize space-time varying 
phenomena over networks. This model is derived by discretiz¬ 
ing a representative second-order partial differential equation 
(PDE), which can be useful in characterizing many dynamic 
systems with spatially-varying properties. We then introduce 
a set of basis functions, e.g., shifted Chebyshev polynomials, 
to represent the space-varying parameters of the underlying 
phenomena in terms of a finite set of space-invariant expansion 
coefficients. Building on this representation, we develop a 
diffusion LMS strategy that cooperatively estimates, inter¬ 
polates, and tracks the model parameters over the network. 
We analyze the convergence and stability of the developed 
algorithm, and derive closed-form expressions to characterize 
the learning and convergence behavior of the nodes in mean- 
square-error sense. It turns out that in the context of space-time 
varying models, the covariance matrices of the regression data 
at the various nodes can become rank deficient. This property 
influences the learning behavior of the network and causes the 
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estimates to become biased. We elaborate on how the judicious 
use of stochastic combination matrices can help alleviate this 
difficulty. 

The paper is organized as follows. In Section II, we 
introduce a space-varying linear regression model which is 
motivated from a physical phenomenon characterized by a 
PDE, and formulate an optimization problem to find the 
unknown parameters of the introduced model. In Section 
III, we derive a diffusion LMS algorithm that solves this 
problem in a distributed and adaptive manner. We analyze the 
performance of the algorithm in Section IV, and present the 
numerical results of computer simulations in Section V. The 
concluding remarks appear in Section VI. 

Notation: Matrices are represented by upper-case and vec¬ 
tors by lower-case letters. Boldface fonts are reserved for 
random variables and normal fonts are used for deterministic 
quantities. Superscript denotes transposition for real¬ 
valued vectors and matrices and (•)* denotes conjugate trans¬ 
position for complex-valued vectors and matrices. The symbol 
E[-] is the expectation operator, Tr(-) represents the trace of 
its matrix argument and diag{-} extracts the diagonal entries 
of a matrix, or constructs a diagonal matrix from a vector. Im 
represents the identity matrix of size M x M (subscript M is 
omitted when the size can be understood from the context). 
The vec(-) operator vectorizes a matrix by stacking its columns 
on top of each other. A set of vectors are stacked into a column 
vector by col{-}. 


II. Modeling and Problem Formulation 


In this section, we motivate a linear regression model that 
can be used to describe dynamic systems with spatially varying 
properties. We derive the model from a representative second- 
order one-dimensional PDE that is used to characterize the 
evolution of the pressure distribution in inhomogeneous media 
and features a diffusion coefficient and an input source, both 
of which vary over space. Extension and generalization of the 
proposed approach, in modeling space-varying phenomena, to 
PDFs of higher order or defined over two-dimensional space 
are generally straightforward (see, e.g.. Section V-C). 

The PDE we consider is expressed as [12], [38]: 


df{x,t) 

dt 


A 

dx 



df{x,t) \ 
dx j 


+ q{x,t) 


( 1 ) 


where {x, t) € [0, L] x [0, T] denote the space and time 
variables with upper limits L € K+ and T S K+, respectively, 
f{x,t) : R, represents the system distribution (e.g., 

pressure or temperature) under study, 0{x): R —> R is the 
space-varying diffusion coefficient and ( 7 (a;,f):R^ —5- R is 
the input distribution that includes sources and sinks. PDE 
(1) is assumed to satisfy the Dirichlet boundary conditions', 
/(O, t) = f{L, t) = 0 for all t € [0, T], The distribution of the 
system at < = 0 is given by f(x, 0) = y{x) for x G [0, L], It 
is convenient to rewrite (1) as: 


df{x,t) 

dt 


d9{x) df{x,t) 
dx^ dx dx 


q{x,t) (2) 


'Generalization of the boundary conditions to nonzero values is possible 
as well. 


and employ the finite difference method (EDM) to discretize 
the PDE over the time and space domains [39]. For N and P 
given positive integers, let Ax = L/{N + 1) and Xk = kAx 
for k G {0,1, 2,..., 1}, and similarly, let At = T/P and 

ti = iAt for i G {0,1,2,..., P}. We further introduce the 
sampled values of the pressure distribution fk{i) — f{xk,ti), 
input qk{i) = q{xk,ti), and space-varying coefficient 9k = 
9{xk)- It can be verified that applying EDM to (2), yields the 
following recursion: 


fk{i) = Uk,ihl +Atqk{i - 1), k £ {1,2,..., N} (3) 
where the vectors G R'^^' and Uk,i G R'^^ are defined as 


Uk,i = - 1), fk{i - 1), fk+i{i - 1)] (5) 

the entries fc € R are: 

^i,k — ~ ^fe-i-i) (6) 

hlk = l-2iy9k (7) 

^3,fc ~ 4 


and iz = At/Ax^. Note that relation (3) is defined for 
k G {1,2,--- ,iV}, i.e., no data sampling is required to be 
taken at a; = {0,L} because fo{i) and /jv_|_i(i) respectively 
correspond to the known boundary conditions f{0,t) and 
f{L,t). For monitoring purposes (e.g., estimation of 9{x)), 
sensor nodes collect noisy measurement samples of f{x,t) 
across the network. We denote these scalar measurement 
samples by 

Zkii) = fkii) P nk{i) (9) 

where nk{i) G R is random noise term. Substituting (3) into 
(9) leads to 

dkii) = Uk,ihl + Ukii) (10) 


where 

dk{i) = Zk{i) - Atqk{i-1) (11) 


The space-dependent model (10) can be generalized to ac¬ 
commodate higher order PDF’s, or to describe systems with 
more than one spatial dimension. In the generalized form, we 
assume that Uk,i is random due to the possibility of sampling 
errors, and therefore represent it using boldface notation Uk,i. 
We also let h% and Uk,i be M-dimensional vectors. In addition, 
we denote the noise more generally by the symbol Vk{i) to 
account for different sources of errors, including the measure¬ 
ment noise shown in (9) and modeling errors. Considering 
this generalization, the space-varying regression model that 
we shall consider is of the form: 


dk{i) = Uk,ihl +Vk{i) (12) 

where dk{i) G R, 1x^4 G G R-^^' and Vk{i) G R. 

In this work, we study networks that monitor phenomena 
characterized by regression models of the form (12), where 
the objective is to estimate the space-varying parameter vectors 
hi for fcG{l,2,---,A^}. In particular, we seek a distributed 
solution in the form of an adaptive algorithm with a diffusion 
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mode of cooperation to enable the nodes to estimate and 
track these parameters over both space and time. The available 
information for estimation of the are the measurement 

samples, {dk{i), Wfc.i}, collected at the N spatial position Xk, 
which we take to represent N nodes. 

Several studies, e.g., [13]-[15], solved space-varying pa¬ 
rameter estimation problems using non-adaptive centralized 
techniques. In centralized optimization, the space-varying pa¬ 
rameters {h^} are found by minimizing the following global 
cost function over the variables {hk}'. 

N 

J(hi,...,/lAf) = ^ Jfc(/i/c) (13) 

where 

Jfe(/ife) = E|dfc(i) - (14) 

To find hi using distributed mechanisms, however, preliminary 
steps are required to transform the global cost (13) into a 
suitable form convenient for decentralized optimization [2]. 
Observe from (6)-(8) that collaborative processing is beneficial 
in this case because the hi of neighboring nodes are related 
to each other through the space-dependent function 9(x). 

Remark 1. Note that if nodes individually estimate their own 
space-varying parameters by minimizing Jk{hk), then at each 
time instant, they will need to transmit their estimates to 
a fusion center for interpolation, in order to determine the 
value of the model parameters over regions of space where no 
measurements were collected. Using the proposed distributed 
algorithm in Section III-B, it will be possible to update the 
estimates and interpolate the results in a fully distributed 
manner. Cooperation also helps the nodes refine their estimates 
and perform more accurate interpolation. ■ 

III. Adaptive Distributed Optimization 

In distributed optimization over networked systems, nodes 
achieve their common objective through collaboration. Such 
an objective may be defined as finding a global parameter 
vector that minimizes a given cost function that encompasses 
the entire set of nodes. For the problem stated in this study, 
the unknown parameters in (13) are node-dependent. However, 
as we explained in Section II, these space-varying parameters 
are related through a well-defined function, e.g., 6{x) over 
the spatial domain. In the continuous space domain, the 
entries of each hi, i.e., {/i° • • • , h^j ^.} can be interpreted 

as samples of M unknown space-varying parameter functions 
{h1{x),--- ,h°j^{x)} at location x = Xk, as illustrated in 
Fig. 1. 

We can now use the well-established theory of interpolation 
to find a set of linear expansion coefficients, common to all 
the nodes, in order to estimate space-varying parameters using 
distributed optimization. Specifically, we assume that the m-th 
unknown space-varying parameter function, h1^{x) can be 
expressed as a unique linear combination of some Ni, space 
basis functions, i.e., 

h1^{x) = Wjn,ibi{x)+Wm, 2 b 2 {x)^ -hWm.At&JVfc(a;) (15) 

where {Wm,n} are the unique expansion coefficients and 
{bn{x)} are the basis functions. In the application examples 



Fig. 1: An example of the space-varying parameter estimation 
problem over a one-dimensional network topology. The larger circles 
on the a:-axis represent the node locations at x = Xk- These nodes 
collect samples {dk{i),Uk,i} to estimate the space-varying parame¬ 
ters {h%}. For simplicity in defining the vectors bk in (20), for this 
example, we assume that the node positions Xk are uniformly spaced, 
however, generalization to non-uniform spacing is straightforward. 

treated in Section V, we adopt shifted Chebyshev polynomials 
as basis functions, which are generated using the following 
expressions [40] 

bi{x) = 1, b 2 {x) = 2x — 1 (16) 

bn+i{x) =2{2x-l)bn{x)-bn-i{x), 2 < n < Nf, (17) 

The choice of a suitable set of basis functions {bnix)}^hi 
is application-specific and guided by multiple considerations 
such as representation efficiency, low computational complex¬ 
ity, interpolation capability, and other desirable properties, 
such as orthogonality. Chebyshev basis functions yield good 
results in terms of the above criteria and helps avoid the 
Runge’s phenomenon at the endpoints of the space interval 
[40]. 

The sampled version of the m-th space-varying parameter 
h1^{x) in (15), at a: = Xfe = kAx, can be written as: 

h‘L,k = W^bk (18) 

where 

Wm = [W^.l, Wm,2, • • • , Wm,N,f (19) 

bk = [bi,k, • ■ • , ( 20 ) 

and each entry bn,k is obtained by sampling the corresponding 
basis function at the same location, i.e., 

bn,k = bn{Xk) = bn{kAx) (21) 

Collecting the sampled version of all M functions h1^{x) for 
m C {1, • • ■ , M} into a column vector as 

K = [hlk.hlk.--- ( 22 ) 

and using (18), we arrive at: 

K = W%k (23) 

where 



wr ,2 ■ 


W211 

W2I2 ■ 

■ ■ ^2,N, 


• 

■■ Wm,n, 
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Remark 2. Several other interpolation techniques can be used 
to obtain the basis functions bn{x), such as the so-called 
kriging method [41]. The latter is a data-based weighting 
approach, rather than a distance-based interpolation. It is 
applicable in scenarios where the unknown random field to be 
interpolated, in our case is wide-sense stationary; accord¬ 
ingly, it requires knowledge about the means and covariances 
of the random field over space, as employed in [42]. If these 
covariances are not available, then the variogram models, 
describing the degree of spatial dependence of the random 
field, are used to generate substitutes for these covariances 
[43]. However, a-priori knowledge about the parameters of 
variogram models, including nugget, sill, and range, are re¬ 
quired to obtain the spatial covariances. In this work, since 
neither the means and covariances nor the parameters of the 
variogram models of the random fields are available, we focus 
on interpolation techniques that rely on distance information 
rather than the statistics of the random field to be interpolated. 
■ 

Returning to equation (23), it is convenient to rearrange 
W° into an MNt, X 1 column vector w° by stacking up the 
columns of W°^, i.e., w° = vec(kk°^), and defining the 
block diagonal matrix Bk € as: 

Bk=lM® bl (25) 

Then, relation (23) can be rewritten in terms of the unique 
parameter vector w° as: 

hi = BkW° (26) 

so that substituting hi from (26) into (12) yields: 

dk{i) = Uk,iBkW° -I- Vk{i) (27) 

Subsequently, the global cost function (13) becomes: 

N 

- Uk^iBkw\‘^ (28) 

k^l 

In the following, we elaborate on how the parameter vector 
w° and, hence, the {hi} can be estimated from the data 
{dk{i),Uk,i} using centralized and distributed adaptive op¬ 
timization. 

A. Centralized Adaptive Solution 

We begin by stating the assumed statistical conditions on 
the data over the network. 

Assumption 1. We assume that {dk{i),Uk,i,Vk{i)} in model 
(27) satisfy the following conditions: 

1) dk{i) and Uk^i are zero-mean, jointly wide-sense sta¬ 
tionary random processes with second-order moments: 

rdu,k = '^dk(i)ull\ € (29) 

Ru,k = e (30) 

2) The regression data {ttfc i} are i.i.d. over time, indepen¬ 
dent over space, and their covariance matrices, Ru,k, are 
positive definite for all k. 

3) The noise processes {rtfc(i)} are zero-mean, i.i.d. over 
time, and independent over space with variances {a^ k}- 


4) The noise process Vk{i) is independent of the regression 
data Um,j for all i,j and k,m. ■ 

The optimal parameter w° that minimizes (28) can be found 
by setting the gradient vector of J (w) to zero. This yields the 
following normal equations: 

N N 

= y]w (31) 

fc=l k=l 

where {Ru,k,rdu,k} denote the second-order moments of 
UkdBk and dfc(i): 

Ru,k = Bk Ru,kBk, rdu,k = Bkrdu,k (32) 

It is clear from (31) that when Ru,k > 0, then w° can 

be determined uniquely. If, on the other hand, Bu,k is 

singular, then we can use its pseudo-inverse to recover the 
minimum-norm solution of (31). Once the global solution is 
estimated, we can retrieve the space-varying parameter vectors 
hi by substituting w° into (26). 

Alternatively the solution w° of (31) can be sought itera¬ 
tively by using the following steepest descent recursion: 

N 

M X! (33) 

k^l 

(c) 

where /r > 0 is a step-size parameter and tu] ' is the estimate 
of w° at the i-th iteration. Recursion (33) requires the central¬ 
ized processor to have knowledge of the covariance matrices, 
Ru,k, and cross covariance vectors, fdu,k, across all nodes. 
In practice, these moments are unknown in advance, and we 
therefore use their instantaneous approximations in (33). This 
substitution leads to the centralized LMS strategy (34)-(35) 
for space-varying parameter estimation over networks. 

Algorithm 1 : Centralized LMS 

N 

wf'’ = + fi'^B'[uli{dk{i) - Uk,,Bkw‘'^\) (34) 

k^l 

hk,i = Bkwf\ ,7V} (35) 


In this algorithm, at any given time instant i, each node 
transmits its data {itfc.i, dk{i)} to the central processing unit to 

(c) 

update wi_^. Subsequently, the algorithm obtains an estimate 
for the space-varying parameters, hk^, by using the updated 
estimate to] , and the basis function matrix at location k, 
(i.e., Bk). This latter step can also be used as an interpolation 
mechanism to estimate the space-varying parameters at loca¬ 
tions other than the pre-determined locations {xk}, by using 
the corresponding matrix B{x) for some desired location x. 

B. Adaptive Diffusion Strategy 

There are different distributed optimization techniques that 
can be applied to (28) in order to estimate w° and consequently 
obtain the optimal space-varying parameters hi. Let A4 de¬ 
note the index set of the neighbors of node k, i.e., the nodes 
with which node k can share information (including k itself). 
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One possible optimization strategy is to decouple the global 
cost (28) and write it as a set of constrained optimization 
problems with local variables Wk, [44], i.e., 

min - ug ^BkiVkl"^ 

Wk ^' 

e&Mk 

subject to Wk = w (36) 

where q ^ are nonnegative entries of a right-stochastic matrix 
C G satisfying: 

ce^k = 0 if f ^ A4 and Cl = 1 (37) 

and 1 is the column vector with unit entries. 

The optimization problem (36) can be solved using, for 
example, the alternating directions method of multipliers 
(ADMM) [44], [45]. In the algorithm derived using this 
method, the Lagrangian multipliers associated with the con¬ 
straints need to be updated at every iteration during the opti¬ 
mization process. To this end, information about the network 
topology is required to establish a hierarchical communication 
structure between nodes. In addition, the constraints imposed 
by (36) require all agents to agree on an exact solution; this 
requirement degrades the learning and tracking abilities of the 
nodes over the network. When some nodes observe relevant 
data, it is advantageous for them to be able to respond quickly 
to the data without being critically constrained by perfect 
agreement at that stage. Doing so, would enable information 
to diffuse more rapidly across the network. 

A technique that does not suffer from these difficulties and 
endows networks with adaptation and learning abilities in real¬ 
time is the diffusion strategy [2], [3], [6], [35], [36]. In this 
technique, minimizing the global cost (28) motivates solving 
the following unconstrained local optimization problem for 
,iV} [2]: 

min ( Cfk^de{i) - ueiBkw\^ 

W \ ^ 

eeMk 

+ X! W.fclk - (38) 

iGAfkXik} 

where ipi is the available estimate of the global parameter at 
node £, JVk\{k} denotes set A4 excluding node k, and {pi^k} 
are nonnegative scaling parameters. Following the arguments 
in [2], [3], [6], the minimization of (38) leads to a general 
form of the diffusion strategy described by (39)-(42), which 
can be specialized to several simpler yet useful forms. 


Algorithm 2 : Diffusion LMS 


(39) 

i&Mk 



- ue^tB(4>k^i_i) 

i&Mk 

(40) 

Wk,k = 4,4e,^ 

(41) 




(42) 


In this algorithm, > 0 is the step-size at node k, 
{wk,i,ipk iJ i-i} intermediate estimates of w°, hk^i is 
an intermediate estimate of /i^, and {a'pki O'^Pk} nonneg¬ 
ative entries of left-stochastic matrices Ai, A 2 G that 

satisfy: 

41 = 41 = 0 if ^ ^ (43) 

Af 1 = 1 A^l = 1 (44) 

Each node k in the first combination step fuses 
in a convex manner to generate 4>k i-i- fn the following step, 
named adaptation, each node k uses its own data and that 
of neighboring nodes, i.e., to adaptively 

update 4>ki-i to an intermediate estimate '4>ki- fti the third 
step, which is also a combination, the intermediate estimates 
{'4’i i}^GA4 ^6 fused to further align the global parameter 
estimate at node k to that of its neighbors. Subsequently, the 
desired space-varying parameter hk,i is obtained from Wk,i. 
Note that each step in the algorithm runs concurrently over 
the network. 

Remark 3. The main difference between Algorithm 2 and the 
previously developed diffusion LMS strategies in, e.g., [2], [6], 
[35] is in the transformed domain regression data U(^iB( in 
(40) which now have singular covariance matrices. Moreover, 
there is an additional interpolation step (42). ■ 

Remark 4. The proposed diffusion LMS algorithm estimates 
NM spatially dependent variables using Ni,M global 

invariant coefficients in w°. From the computational com¬ 
plexity and energy efficiency point of view, it seems this 
is advantageous when the number of nodes, N, is greater 
than the number of basis functions Ni,. However, even if this 
is not the case, using the estimated Ni,M global invariant 
coefficients, the algorithm not only can estimate the space- 
varying parameters at the locations of the N nodes, but 
can also estimate the space-varying parameters at locations 
where no measurements are available. Therefore, even when 
N < Nk, the algorithm is still useful as it can perform 
interpolation. ■ 

There are different choices for the combination matrices 
{Ai,A 2 ,C'}. For example, the choice Ai = A 2 = C = / 
reduces the above diffusion algorithm to the non-cooperative 
case where each node runs an individual LMS filter without 
coordination with its neighbors. Selecting C = I simplifies the 
adaptation step (40) to the case where node k uses only its 
own data {dk{i),Uk,i} to perform local adaptation. Choosing 
Ai = / and A 2 = A, for some left-stochastic matrix A, 
removes the first combination step and the algorithm reduces 
to an adaptation step followed by combination (this variant of 
the algorithm has the Adapt-then-Combine or ATC diffusion 
structure) [2], [6]. Likewise, choosing Ai = A and A 2 = / re¬ 
moves the second combination step and the algorithm reduces 
to a combination step followed by adaptation (this variant 
has the Combine-then-Adapt (CTA) structure of diffusion [2], 
[6]). Often in practice, either the ATC or CTA version of the 
algorithm is used with C set to 67 = / such as using the 
following ATC diffusion version described by equations (45)- 
(47). 
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Algorithm 3 : Diffusion ATC 

'^k,i = Wk,i-i + k'kBlul^^{dk{i) - Uk,iBkWk^t-i) 

(45) 

Wk,z = ^ ai,k4>i,i 

(46) 




(47) 


Nevertheless for generality, we shall study the performance 
of Algorithm 2 for arbitrary matrices {Ai,A 2 ,C} with C 
right-stochastic and {^ 1 ,^ 2 } left-stochastic. The results can 
then be specialized to various situations of interest, including 
ATC, CTA, and the non-cooperative case. 

The combination matrices {Ai,A 2 ,C} are normally ob¬ 
tained using some well-known available combination rules 
such as the Metropolis or uniform combination rules [2], [24], 
[35]. These matrices can also be treated as free variables 
in the optimization procedure and used to further enhance 
the performance of the diffusion strategies. Depending on 
the network topology and the quality of the communication 
links between nodes, the optimized values of the combination 
matrices differ from one case to another [6], [46]-[48]. 


IV. Performance Analysis 

In this section, we analyze the performance of the diffusion 
strategy (39)-(42) in the mean and mean-square sense and 
derive expressions to characterize the network mean-square 
deviation (MSD) and excess mean-square error (EMSE). In 
the analysis, we need to consider the fact that the covariance 
matrices {Ru,k}^=i defined in (32) are now rank-deficient 
since we have Nb > 1. We explain in the sequel the 
ramifications that follow from this rank-deficiency. 


A. Mean Convergence 

We introduce the local weight-error vectors 


Wk,i = w°-Wk,i, 4>k,i - 

and define the network error vectors: 

-<t>kd (48) 

=COl{0ii,...,0^ J 

(49) 

=col{^ii,...,^^ J 

(50) 

Wi = coljuti^i, . . . , WN,i} 

(51) 


We collect the estimates from across the network into the block 
vector: 


lUj = col{u;i^,,... (52) 

and introduce the following extended combination matrices: 


Ai= Ai® iMNi 

(53) 

A2 = A 2 ® ItANk 

(54) 

iMNk 

(55) 


We further define the block diagonal matrices and vectors: 
TZi = diagj ^ ce^kBj : A; = 1, • ■ • , (56) 


Ad = diaglpi/MAb, • ■ • (57) 

ti = col| ^ a^kBj uJid({i) : k = !,■■■ , (58) 

t&Mk 

g, = C^col{BfMf ,t!i(i), • • • (59) 

and introduce the expected values of TZi and ti. 

7^ ^ E[7^,] = diag{ i?i, • • • , } (60) 

r = E[fi] = coljri, • • • ,rA} ( 61 ) 

where 

Rk = ^ cr,fe Ru.i (62) 

leMk 

rfc = ^ cr,fe fdu,i (63) 

teMk 


We also introduce an indicator matrix operator, denoted by 
Ind(-), such that for any real-valued matrix X with (fc, j)-th 
entry X^j, the corresponding entry of F = Ind(2f) is: 


Yk, 


1, if Xkj > 0 

0, otherwise 


Now from (39)-(41), we obtain: 


where 


= BiWi-i + MU 

B^ = Al{I-M'JZ^)Al 


(64) 


(65) 

( 66 ) 


In turn, making use of (27) in (65), we can verify that the 
network error vector follows the recursion 


Wi = BiW,_i - AjMOi 


(67) 


By taking the expectation of both sides of (67) and using 
Assumption 1, we arrive at: 

E[ti;,] =SE[*,_i] (68) 

where in this relation: 


B A E[B^] =Al{I- M'R)A{ (69) 

To obtain (68), we used the fact that the expectation of the 
second term in (67), i.e., E[Al|’Afg'j], is zero because Vk(i) 
is independent of u^.i and E[r;fc(i)] = 0. The rank-deficient 
matrices {Ru,k} appear inside TZ in (69). We now verify that 
despite having rank-deficient matrix TZ, recursion (68) still 
guarantees a bounded mean error vector in steady-state. 


To proceed, we introduce the eigendecomposition: 

Rk = QkAkQl (70) 

where Qk = [qk,i, • ■ • ,qk,MNk] is a unitary matrix with col¬ 
umn eigenvectors qf^ j and = diag{Afe(l), • • • , Afc(MA),)} 
is a diagonal matrix with eigenvalues Xk{j) > 0. Eor this 
decomposition, we assume that the eigenvalues of R^ are 
arranged in descending order, i.e, Xmax{Rk) = Afe(l) > 
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Afc(2) > ••• > Xk{MNi,), and the rank of Rk is Lk < 
MNb. If we define Q = diagjQi,..., and A = 
diagjAi,--- jA^v}, then the network covariance matrix, TZ, 
given by (60) can be expressed as: 

TZ = QKQT (71) 


We now note that the mean estimate vector, ]E['U)i], expressed 
by (68) will be asymptotically unbiased if the spectral radius 
of 6, denoted by p{B), is strictly less than one. Let us examine 
under what conditions this requirement is satisfied. Since Ai 
and A 2 are left-stochastic matrices and TZ is block-diagonal, 
we have from [6] that: 

p{B) = p[aI (/ - MTZ)Al ) <p{l- MTZ) (72) 

Therefore, if TZ is positive-definite, then choosing 
Pk < 2/Amax(7?fc) ensures convergence of the algorithm 
in the mean so that E[i3;i] —s- 0 as i —)■ cx). However, when TZ 
is singular, it may hold that p{B) = 1, in which case choosing 
the step-sizes according to the above bound guarantees the 
boundedness of the mean error, but not necessarily 

that it converges to zero. The following result clarifies these 
observations. 


Theorem 1. If the step-sizes are chosen to satisfy 

2 


0 < Mfc < 


Xma-x{Rk^ 


(73) 


then, under Assumption 1, the diffusion algorithm is stable in 
the mean in the following sense: (a) If p{B) < 1, then E[i(;i] 
converges to zero and (b) if p{B) = 1 then 


lim E[rh,] < ||/-Ind(A)||6,oo E[th_i] (74) 


where || • ||b 00 stands for the block-maximum norm, as defined 
in [6], [47]’. 

Proof: See Appendix A. ■ 

In what follows, we examine recursion (65) and derive an 
expression for the asymptotic value of E[t(;i]—see (89) further 
ahead. Before doing so, we first comment on a special case 
of interest, namely, result (76) below. 

Special case: Consider a network with Ai = A 2 = I and 
an arbitrary right stochastic matrix C satisfying (37). Using 
(27) and (62)-(63), it can be verified that the following linear 
system of equations holds at each node k: 


RkW° = Tk 


(75) 


We show in Appendix B that under condition (73) the mean 
estimate of the diffusion LMS algorithm at each node k will 
converge to: 

MNh 

lim -f qk,nqk n^b^k-i] (76) 

i—^oo ^ ’ 

n=Lfc+l 

where Rj. represents the pseudo-inverse of Rk, and Wk,-i is 
the node initial value. This result is consistent with the mean 
estimate of the stand-alone LMS filter with rank-deficient input 
data (which corresponds to the situation Ai = A 2 = C = I) 
[49]. Note that in (76) corresponds to the minimum- 

norm solution of RkW = rk- Therefore, the second term on 


the right hand side of (76) is the deviation of the node estimate 
from this minimum-norm solution. The presence of this term 
after convergence is due to the zero eigenvalues of Rk- If 
Rk were full-rank so that Lk = MNb, then this term would 
disappear and the node estimate will converge, in the mean, 
to its optimal value, w°- We point out that even though the 
matrices Ru^i are rank deficient since Nt, > 1, it is still 
possible for the matrices Rk to be full rank owing to the 
linear combination operation in (62). This illustrates one of the 
benefits of employing the right-stochastic matrix C- However, 
if despite using C, Rk still remains rank-deficient, the second 
term on the right-hand side of (76) can be annihilated by 
proper node initialization (e.g., by setting E[i(;fc _i] = 0). By 
doing so, the mean estimate of each node will then approach 
the unique minimum-norm solution, R\rk- 

General case: Let us now find the mean estimate of the 
network for arbitrary left-stochastic matrices Ai and A 2 - 
Considering definitions (60)-(61) and relation (75) and noting 
that (1(g) w®) = A^ (1(8) 17)°) = (1(8) w°), it can be verified 
that (l(8)tc°) satisfies the following linear system of equations: 

{I-B){l®w°) = Al-Mr (77) 

This is a useful intermediate result that will be applied in our 
argument. 

Next, if we iterate recursion (65) and apply the expectation 
operator, we then obtain 

i 

E[t(;i] = i3*+^E[t(;_i] + ^ B^A^ Mr (78) 

7=0 

The mean estimate of the network can be found by computing 
the limit of this expression for i > cx). To find the limit 
of the first term on the right hand side of (78), we evaluate 
limi_ 5 .oo S* and find conditions under which it converges. For 
this purpose, we introduce the Jordan decomposition of matrix 
B as [50]: 

B = ZTZ-^ (79) 

where Z is an invertible matrix, and T is a block diagonal 
matrix of the form 

r = diag{ri,r2,---,r,} (80) 

where the Lth Jordan block, L; € expressed 

as: 

F; = 'yilmi + Nm, (81) 


In this relation, Nmi is some nilpotent matrix of size mi xmi- 
Using decomposition (79), we can express as 


= zrz-^ 

(82) 

Since F is block diagonal, we have 


r* = diag{ri,r*,.-- ,r:} 

(83) 


From this relation, it is deduced that limi_s.oo exists if 
limi_ 5 .oo F] exists for all / € {1, • • • , s}. Using (81), we can 
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write [50]: 


lim ri = lim 7;-"*' (iPlm, + E' 

•,—¥00 t—¥CCi \ ' ^ \^p J ' 


When i ^ 00 , 7 *“™' becomes the dominant factor in this 
expression. Note that under condition (73), we have p(B) < 1 
which in turn implies that the magnitude of the eigenvalues 
of B are bounded as 0 < | 7 „| < 1. Without loss of 
generality, we assume that the eigenvalues of B are arranged 
as | 7 i| < ••• < | 7 i| < | 7 i,+i| = •■• = | 7 ^| = 1. Now we 
examine the limit (84) for every | 7 ;| in this range. Clearly for 
| 7 ;| < 1 , the limit is zero (an obvious conclusion since in this 
case F; is a stable matrix). For | 7 i| = 1, the limit is the identity 
matrix if 7 ; = 1 and mi = 1. However, the limit does not exist 
for unit magnitude complex eigenvalues and eigenvalues with 
value -1, even when mi = 1. Motivated by these observations, 
we introduce the following definition. 

Definition: We refer to matrix B as power convergent if (a) 
its eigenvalues 7 „ satisfy 0 < | 7 „| < 1, (b) its unit magnitude 
eigenvalues are all equal to one, and (c) its Jordan blocks 
associated with 7 „ = 1 are all of size 1 x 1 . ■ 

Example 1: Assume Nb = 1, Bk = Im, and uniform step- 
sizes and covariance matrices across the agents, i.e., pk = p, 
Ru.k = Ru for all k. Assume further that C is doubly- 
stochastic (i.e., C^l = 1 = Cl) and is singular. Then, 
in this case, the matrix B can be written as the Kronecker 
product B = AjAj 0 {Im — pRu)- For strongly-connected 
networks where A 1 A 2 is a primitive matrix, it follows from 
the Perron-Frobenius Theorem [51] that A 1 A 2 has a single 
unit-magnitude eigenvalue at one, while all other eigenvalues 
have magnitude less than one. We conclude in this case, from 
the properties of Kronecker products and under condition 
(73), that B is a power-convergent matrix. ■ 


Example 2 : Assume M = 2, N = 3, Ni, = 1, Bk = Im, and 
uniform step-sizes and covariance matrices across the agents 
again. Let A 2 = I = C and select 

■ 1/2 0 o' 

Ai = A= 1/2 0 1 (85) 

0 1 0 

which is not primitive. Let further = diag{/3,0} denote 
a singular covariance matrix. Then, it can be verified in this 
case the corresponding matrix B will have an eigenvalue with 
value —1 and is not power convergent. ■ 

Returning to the above definition and assuming B is power 
convergent, then this means that the Jordan decomposition (79) 
can be rewritten as: 



r 2-1 


where J is a Jordan matrix with all eigenvalues strictly inside 
the unit circle, and the identity matrix inside F accounts for 
the eigenvalues with value one. In ( 86 ) we further partition Z 


and Z ^ in accordance with the size of J. Using ( 86 ), it is 
straightforward to verify that 

lim = Z 2 Z 2 (87) 

i—^oo 

and if we multiply both sides of (77) from the left by Z 2 , it 
also follows that 

Z2A2Mr = 0 ( 88 ) 

Using these relations, we can now establish the following 
result, which describes the limiting behavior of the weight 
vector estimate. 

Theorem 2. If the step-sizes {/ii,--- satisfy (73) and 

matrix B is power convergent, then the mean estimate of the 
network given by (78) asymptotically converges to: 

limEK] = {Z 2 Z 2 )E[w_i] + [I - BYAlMr (89) 

i—¥oo 

where the notation X~ denotes a (reflexive) generalized 
inverse for the matrix X. In this case, the generalized inverse 
for I — B is given by 

{I-B)-= Zi{I-J)-^Zi (90) 

which is in terms of the factors {Zi, Zi, J} defined in ( 86 ). 

Proof: See Appendix C. ■ 

We also argue in Appendix C that the quantity on the right- 
hand side of (89) is invariant under basis transformations for 
the Jordan factors {Zi, Zi, Z 2 , .Z 2 }. It can be verified that if 
Ai = A 2 = I then B will be symmetric and the result (89) 
will reduce to (76). Now note that the first term on the right 
hand side of (89) is due to the zero eigenvalues of I — B. 
From this expression, we observe that different initialization 
values generally lead to different estimates. However, if we 
set E[t(;_i] = 0 , the algorithm converges to: 

lim EK] = (7 - B)-A^Mr (91) 

i—^oo 

In other words, the diffusion LMS algorithm will converge on 
average to a generalized inverse solution of the linear system 
of equations defined by (77). 

When matrix B is stable so that p{B) < 1 then the 
factorization ( 86 ) reduces to the form B = ZiJZi and I ~ B 
will be full-rank. In that case, the first term on the right hand 
side of (89) will be zero and the generalized inverse will 
coincide with the actual matrix inverse so that (89) becomes 

\imE[wi] = {I-B)-^AlMr (92) 

i—¥co 

Comparing (92) with (77), we conclude that: 

lim E[tUi] = 1 (g) (93) 

which implies that the mean estimate of each node will be 
w°. This result is in agreement with the previously developed 
mean-convergence analysis of diffusion LMS when the regres¬ 
sion data have full rank covariance matrices [ 6 ]. 

B. Mean-Square Error Convergence 

We now examine the mean-square stability of the error 
recursion (67) in the rank-deficient scenario. We begin by 
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deriving an error variance relation as in [52], [53]. To find 
this relation, we form the weighted square “norm” of (67), 
and compute its expectation to obtain: 

=¥.{\\w,_4l,) + ¥.[gj MA2^AlMgi] (94) 

where ||x|||; = x^YjX and E > 0 is an arbitrary weighting 
matrix of compatible dimension that we are free to choose. In 
this expression, 

T,'= Ai{I-Mni)'^A2T.Al{I-Mni)Al (95) 

Under the temporal and spatial independence conditions on 
the regression data from Assumption 1, we can write: 

E(||*,_i|||,) =E||*.-i||^[i:'] (96) 

so that (94) becomes: 

E||th,||| =E||th,_i|||, +Tr[Eyf^MeAf^2] (97) 

where Q = Elg^^gf] is given by 

g = C'^dmg{al;^Ru,i, ■■■, aljfRu,N}C (98) 

and 

E' ^ E[S'] = + O(M^) « B^SB (99) 

We shall employ (99) under the assumption of sufficiently 
small step-sizes where terms that depend on higher-order 
powers of the step-sizes are ignored. We next introduce 

y = AjMgMA2 ( 100 ) 

and use (97) to write: 

Ell*,III = E||*,_i|||, + Tr(E3^) (101) 

From (101), we arrive at 

i 

E||*i||| =E||*_i||^BT)i+iEBi+i +^Tr((B^)^EB^T) (102) 

3=0 

To prove the convergence and stability of the algorithm in the 
mean-square sense, we examine the convergence of the terms 
on the right hand side of (102). 

In a manner similar to (88), it is shown in Appendix D that 
the following property holds: 

Z23^ = 0, yz^ = 0 (103) 

Exploiting this result, we can arrive at the following statement, 
which establishes that relation (102) converges as i ^ oo and 
determines its limiting value. 

Theorem 3. Assume the step-sizes are sufficiently small and 
satisfy (73). Assume also that B is power convergent. Under 
these conditions, relation (102) converges to 

^lim E||*,||| = 

+ (vec(3^))^(/ - J')-ivec(E) (104) 

where 

.F= ((Zi®Zi)(J® J)(Zi®Zi))^ (105) 

and factors {Zi, Zi, J} are defined in (86). 


Proof: See Appendix D. ■ 

In a manner similar to the proof at the end of Appendix C, 
the term on the right hand side of (104) is invariant under basis 
transformations on the factors {Zi, Zi, Z 2 , Z 2 }. Note that the 
first term on the right hand side of (104) is the network penalty 
due to rank-deficiency. When the node covariance matrices are 
full rank, then choosing step-sizes according to (73) leads to 
p{B) < 1. When this holds, then B = Zi JZi. In this case, 
the first term on the right hand side of (104) will be zero, and 
F = {B ® B)'^. In this case, we obtain: 

lim E||*i||| = (vec([y))^(/— J^)~^vec(E) (106) 

i—foo 

which is in agreement with the mean-square analysis of 
diffusion LMS strategies for regression data with full rank 
covariance matrices given in [2], [6]. 

C. Learning Curves 

For each k, the MSD and EMSE measures are defined as: 

r]k = lim EWhk^iW^ = lim E||*fc,i|||TB, (107) 

Cfc = lim E||Mfe,,/rfe,,_i||^ = lim E||*fe,,_i ||| (108) 

where hk,i = — hk^i- These parameters can be computed 

from the network error vector (104) through proper selection 
of the weighting matrix E as follows: 

Vk = lim E||*,||| Cfe = lim E||*,_i||| (109) 

1^00 Z—^00 emse^. 

where 

^msdfe — diag(efc) (g) (BlBk), Eemse^ = diag(efc) g) Ru,k 

( 110 ) 

and {ek}^^i denote the vectors of a canonical basis set in N 
dimensional space. The network MSD and EMSE measures 
are defined as 

1 AT ^ N 

Vnet — ^ ^ Cnet ^ ^ ^ Cfc (111) 

k=l k=l 

We can also define MSD and EMSE measures over time as 

77fc(i) = E||/ifc,i||2 ^E||*,|||_^^^^ (112) 

Cfc(f) = E||Mfc,,/rfc,,_if = E||*i_i|||^_^^^^ (113) 

Using (102), it can be verified that these measures evolve 
according to the following dynamics: 

Vk{i) = r]k{i - 1) - lk°llw«(/-'H)<T„3d, + 

(114) 

Cfc(*) = Cfe(* - 1) - lk°llw*(/-W)ae„.e, + a'^^Vemse, 

(115) 

where 

n = (116) 

a = vec(3^) (117) 

f^msclfe — V6c(X]nisd^) (118) 

^emsefe — Vec(X]emsefe) (119) 
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To obtain (114) and (115), we set = 0 for all k. 

V. Computer Experiments 

In this section, we examine the performance of the diffusion 
strategy (39)-(42) and compare the simulation results with 
the analytical hndings. In addition, we present a simulation 
example that shows the application of the proposed algorithm 
in the estimation of space-varying parameters for a physical 
phenomenon modeled by a PDE system over two spatial 
dimensions. 

A. Performance of the Distributed Solution 

We consider a one-dimensional network topology, illustrated 
by Eig. 1, with L = 1 and equally spaced nodes along 
the X direction. We choose Ai as the identity matrix, and 
compute A 2 and C based on the uniform combination and 
Metropolis rules [2], [6], respectively. We choose M = 2 
and Nb = 5 and generate the unknown global parameter 
w° randomly for each experiment. We obtain using the 
shifted Chebyshev polynomials given by (17) and compute 
the space varying parameters according to (26). The 
measurement data dk{i), k G {1,2,--- ,iV} are generated 
using the regression model (12). The SNR for each node k 
is computed as SNR^ = 1 ^. The noise and the 

entries of the regression data are white Gaussian and satisfy 
Assumption 1. The noise variances, {cr^^.}, and the trace of 
the covariance matrices, {Tr(i?„ fe)}, are uniformly distributed 
between [0.05,0.1] and [1,5], respectively. 

Eigure 2 illustrates the simulation results for a network with 
TV = 4 nodes. Eor this experiment, we set = 0.01 for all 
k and initialize each node at zero. In the legend of the figure, 
we use the subscript h to denote the MSD for hk,i and the 
subscript w to refer to the MSD of Wk^i- The simulation curves 
are obtained by averaging over 300 independent runs, it can 
be seen that the simulated and theoretical results match well 
in all cases. To obtain the analytical results, we use expression 
(104) to assess the steady-state values and expression (114) to 
generate the theoretical learning curves. 

Two important points in Fig. 2 need to be highlighted. First, 
note from the top plot that the network MSD for Wk,i is larger 
than that for hk,i. This is because 

SO that the MSD of hk,i is a weighted version of the MSD 
of Wk,i. In this experiment, the weighting leads to a lower 
estimation error. Second, note from the bottom plot that 
while the MSD values of Wk,i are largely independent of 
the node index, the same is not true for the MSD values of 
hk^i- In previous studies on diffusion LMS strategies, it has 
been shown that, for strongly-connected networks, the network 
nodes approach a uniform MSD performance level [36]. The 
result in Fig. 2(b) supports this conclusion where it is seen 
that the MSD of Wk^i for nodes 2 and 4 converge to the same 
MSD level. However, note that the MSD of hk^i is different 
for nodes 2 and 4. This difference in behavior is due to the 
difference in weighting across nodes from (120). 



(a) The network MSD. 
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(b) The MSD at some individual nodes. 

Fig. 2: The network MSD learning curve for N = A. 


B. Comparison with Centralized Solution 

We next compare the performance of the diffusion strategy 
(39)-(42) with the centralized solution (34)-(35). We consider 
a network with V = 10 nodes with the topology illustrated 
by Fig. 1. In this experiment, we set pk = 0.02 for all k, 
while the other network parameters are obtained following 
the same construction described for Fig. 2. As the results in 
Fig. 3 indicate, the diffusion and centralized LMS solutions 
tend to the same MSD performance level in the w domain. 
This conclusion is consistent with prior studies on the per¬ 
formance of diffusion strategies in the full-rank case over 
strongly-connected networks [36]. However, discrepancies in 
performance are seen between the distributed and centralized 
implementations in the h domain, and the discrepancy tends 
to become larger for larger values of N. This is because, in 
moving from the w domain to the h domain, the inherent 
aggregation of information that is performed by the centralized 
solution leads to enhanced estimates for the variables. For 
example, if the estimates Wk^i which are generated by the 
distributed solution are averaged prior to computing the hk^i, 
then it can be observed that the MSD values of hk^i for both 
the centralized and the distributed solution will be similar. 

In these experiments, we also observe that if we increase 
the number of basis functions. A),, then both the centralized 
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(b) Nk = 10. 

Fig. 3: The network MSD learning curve for N = 10. 


and diffusion algorithms will converge faster but their steady- 
state MSD performance will degrade. Therefore, in choosing 
the number of basis functions, Nb, there is a trade off between 
convergence speed and MSD performance. 


C. Example: Two-Dimensional Process Estimation 


In this example, we consider a two-dimensional network 
with 13 X 13 nodes that are equally spaced over the unit square 
(x,y) G [0,1] X [0,1] with Ax = Ay = 1/12 (see Fig. 4(a)). 
This network monitors a physical process f{x,y) described 
by the Poisson PDF: 


d'^f{x,y) d'^f{x,y) 

dx'^ dy^ 


h{x,y) 


( 121 ) 


where h{x,y) : [0, 1]^ —^ K is an unknown input function. 
The PDF satisfies the following boundary conditions: 


/(x, 0) = /(O, y) = /(x, 1) = /(I, y) = 0 

For this problem, the objective is to estimate h{x,y), given 
noisy measurements collected hy N = x Ny = 11 x 11 
nodes corresponding to the interior points of the network. To 
discretize the PDF, we employ the finite difference method 
(FDM) with uniform spacing of Ax and Ay. We define 
Xki = kiAx, yk 2 = k 2 Ay and introduce the sampled values 




(b) fki,k 2 over the space. 

Fig. 4: Spatial distribution of /(x, y) over the network grid 

{{xk,,yk2)}- 



X 


Fig. 5: Spatial distribution of SNR over the network. 


fkuk 2 = f{xki,yk 2 ) and = h{xk,,yk 2 )- We use the 

central difference scheme [39] to approximate the second order 
partial derivatives: 


d'^f(x,y,t) _ 1 

9x^ Ax^ 

d'^f{x,y,t) 1 


[fki-kl,k2 - 2/fci,fe2 + /fei-l.fca] (122) 

2 [/fel,fe2-|-l 2/fej_fc2 T fki,k2 — l ] (123) 


dy^ Ay' 

This leads to the following discretized input function: 

~ ifkl+l,k2 T /fel,/C2 + l + /fci — 1,^2 

+ fk^M-i -'^fk^,k2) ( 124 ) 


For this example, the unknown input process is 

hi = £"'"(('=^■‘‘("+('=="‘‘1') - -f 1 

(125) 

where k = (N^ — 1)^/4. 

To obtain fki,k 2 ^ we solve (121) using the Jacobi over¬ 
relaxation method [45]. Figure 4(b) illustrates the values of 
fkiM over the spatial domain. For the estimation of hki,k 2 ^ 
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(a) True parameters. (b) Estimated parameters. 

Fig. 6: True and estimated by diffusion LMS. 


the given information are the noisy measurement samples 
ZkiMi'i') = fkiM +’^fci.fc 2 ( 0 - In this relation, the noise pro¬ 
cess ^ 2 ( 1 ) is zero mean, temporally white and independent 
over space. For this network, the two dimensional reference 
signal is the distorted version of which is represented 

by The reference signal is obtained from (124) with 

fkiM replaced by their noisy measured samples i.e.. 


dk 




^k-i_,k2 — l{.'^') '^^ki ,k2{^^ 


According to (126), the linear regression model for this 
problem takes the following form; 

dki,k2{}) ~'^ki,k2{})^k-i,k2 “b'nfci,fc2(0 (127) 

where Mfci,fe 2 (*) = 1- Therefore, in this example, we are 
led to a linear model (127) with deterministic as opposed to 
random regression data. Although we only studied the case of 
random regression data in this article, this example is meant 
to illustrate that the diffusion strategy can still be applied to 
models involving deterministic data in a manner similar to [1], 

[54] . 

To represent as a space-invariant parameter vector, 

we use two-dimensional shifted Chebyshev basis functions 

[55] . Using this representation, can be expressed as: 

Nt 

KiM = (128) 

n—1 

where each element of the two-dimensional basis set is: 



X 


Fig. 7: Network steady-state MSD performance in dB. 

VI. Conclusion 

By combining interpolation and distributed adaptive opti¬ 
mization, we proposed a diffusion LMS strategy for estimation 
and tracking of space-time varying parameters over networks. 
The proposed algorithm can find the space-varying parameters 
not only at the node locations but also at spaces where no 
measurement is collected. We showed that if the network 
experiences data with rank-deficient covariance matrices, the 
non-cooperative LMS algorithm will converge to different 
solutions at different nodes. In contrast, the diffusion LMS 
algorithm is able to alleviate the rank-deficiency problem 
through its use of combination matrices especially since, as 
shown by (72), p{B) < p{I — A47Z), where I — M.TZ is 
the coefficient matrix that governs the dynamics of the non- 
cooperative solution. Nevertheless, if these mechanisms fail to 
mitigate the deleterious effect of the rank-deficient data, then 
the algorithm converges to a solution space where the error 
is bounded. We analyzed the performance of the algorithm in 
transient and steady-state regimes, and gave conditions under 
which the algorithm is stable in the mean and mean-square 
sense. 


Appendix A 

Mean Error Convergence 


Pn^ki,k2 — ^ni,ki^n2,k2 (129) 

where {bmM} and {bn 2 ,k 2 } '^be one-dimensional 

shifted Chebyshev polynomials in the x and y directions, 
respectively-recall (21). 

In the network, each interior node communicates with its 
four immediate neighbors. We use Ai = I and compute C 
and A 2 by using the Metropolis and relative degree rules [2], 
[6], [35]. All nodes are initialized at zero and pk = 0.01 for all 
k. The signal-to-noise ratio (SNR) of the network is uniformly 
distributed in the range [20, 30]dB and is shown in Fig. 5. 

Figures 6(a) and 6(b) show three dimensional views 
of the true and estimated input process using the pro¬ 
posed diffusion LMS algorithm after 3000 iterations. Fig¬ 
ure 7 illustrates the MSD of the estimated source, i.e.. 


Based on the rank of 7^ = diag{i?i, • • • , Rn}, we have two 
possible cases: 

a) Rk > 0 Vfc G {I,-- - As (68) implies, E['thi] 
converges to zero if p{B) < 1. In [6], it was shown that when 
72. > 0, choosing the step-sizes according to (73) guarantees 
p{B) < 1. 

b) 3k G {I,-- - ,7V} for which Rk is rank-deficient: For 
this case, we first show that 

(130) 

where || • ||{, 00 denotes the block-maximum norm for block 
vectors with block entries of size MN^xl and block matrices 
with blocks of size MN^x MNt- To this end, we note that for 
the left-stochastic matrices Ai and A 2 , we have ||Af ||b,oo = 
||-T|^||b,oo = 1 [6], and use the sub-multiplicative property of 
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the block maximum norm [46] to write: 


11^ ^ < 11-^2 IkiOO 11-^ ■M.T^\\b,oc 11^1 llb.oo X • • • 

X M^II&,oo \\I-Mn\\b,^ Mfilfc.oo 
= \\I - MTZ\\l~^^ (131) 

If we introduce the (block) eigendecomposition of 7?. (71) into 
(131) and consider the fact that the block-maximum norm is 
invariant under block-diagonal unitary matrix transformations 
[6], [47], then inequality (131) takes the form: 

(132) 

II II 6,00 — II II 6,00 


Using the property ||2f||;,oo = p{X) for ^ block diagonal 
Hermitian matrix X [6], we obtain: 


{i-MAy+\^=p(^{i-MAy+^) 

|(l - AifcAfe(n))*''’^ 


= max 
l<fe<iV 
l<n<MNb 


= ( max |1 -/rfeAfc(n)|) 
V i<fe<Af 7 


i-l-l 


l<n<MNb 

= (p{I-MA)) 
= \\I-MA 


\ i-l-l 

>) 

^+1 

6,00 


(133) 


Using (133) in (132), we arrive at (130). We now proceed 
to show the boundedness of the mean error for case (b). We 
iterate (68) to get: 


E[wi] = B^+^E[w_i] (134) 


Applying the block maximum norm to (134) and using in¬ 
equality (130), we obtain: 


lim Elthd , 


< lim {I-MAy+^ 

i—¥oo 


E[th_i 


(135) 


The value of limi_>oo ||(7 —Af A)*+^||f,_oo can be computed by 
evaluating the limits of its diagonal entries. Considering the 
step-sizes as in (73), the diagonal entries are computed as: 


lim (l - AifcAfe(n)) 


1+1 


Therefore, (135) reads as: 


1, ifAfe(n) = 0 
0, otherwise 


(136) 


^lim ||E[m,]||^_^ < ||/-Ind(A)||h,oo ||E[m_i](137) 


We define ^ = Q^Wk^i and start from some initial condition 
to arrive at 

lE[Pfc,i] = [I - MfcAfc]]E[Pfc,i-i] = [I - PkAkY^^E[pi, _y 

If we choose the step-sizes according to (73) then we get: 

lim E[pfe J =[I- Ind(Afc)]E[pfc i] (140) 
2—)-00 

Equivalently, this can be written as: 

lim E[tt>fc i] = Qfc [/ - Ind(Afc)] QI E[t(;fe _i] (141) 

i—¥oo 

This result indicates that the mean error does not grow 
unbounded. Now from (75), we can verify that: 

gfcInd(Afe)g^u;° = Rlrk (142) 

Then, upon substitution of Wk^i = w° — Wk,i into (141), we 
obtain: 

lim E[tUfc_j] = QfcInd(Afc)QfcU;° -I- Qk[I - Ind(Afc)]QrE[iyfc.-i] 

i—¥oo 

MNb 

= Rk'^k+ -l] (143) 

n=Lk+l 

Appendix C 
Proof of Lemma 2 
From (87), we readily deduce that 

lim ,B*'''^E[t(;_i] = (.Z2'22) E[t(;_i] (144) 

i—¥OCi 

On the other hand, from (86), we have 

i i 

lim 'yB^A^Mr = lim V iZi.PZi -f Z2Z2)AlMr 
j=0 j=0 

(145) 

Using (88), the term involving Z 2 cancels out and the above 
reduces to 

i i 

lim yB^AlMr = lim V (ZiPzAaIMt 

i—^oo ‘ ^ i—¥oo ‘ ^ 

j^O j=0 

= Zi{I - jyZiA'^Mr (146) 

since p(J) < 1. We now verify that the matrix 

X-= Zi{I - J)-^Zi (147) 

is a (reflexive) generalized inverse for the matrix X = {I — B). 
Recall that a (reflexive) generalized inverse for a matrix Y is 
any matrix Y~ that satisfies the two conditions [56]: 


Appendix B 

Mean Behavior When {Ai = A 2 = I) 

Setting Ai = A 2 = / in the diffusion recursions (39)-(41) 
and subtracting w° from both sides of (40), we get: 

ieXk 

(138) 

Under Assumption 1 and using di{i) = ui^iBiW° + Vi{i), we 
obtain: 

E[wk,i] = Qk[I - PkAk]Ql E[t(jfe,j_i] (139) 


YY-Y = Y (148) 

Y-YY- = Y- (149) 

To verify these conditions, we first note from ZZ~^ = J and 
Z~^Z = / in (86) that the following relations hold: 

ZiZi -f Z 2 Z 2 = 1 (150) 

.ZiZ2 = 0 (151) 

Z2-Zi = 0 (152) 

ZiZi = / (153) 

Z 2 Z 2 = I (154) 
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We further note that X can be expressed as: 

X = {I - B) = Zi{I - J)Zi (155) 

It is then easy to verify that the matrices {X,X~} satisfy 
conditions (148) and (149), as claimed. Therefore, (146) can 
be expressed as: 

i 

\im Mr = {IMr (156) 

i—1-00 ^^ 

3=0 

Substituting (144) and (156) into (78) leads to (89). 

Let us now verify that the right-hand side of (89) remains 
invariant under basis transformations for the Jordan factors 
{Zi, Zi, Z 2 , Z 2 }. To begin with, the Jordan decomposition 
(86) is not unique. Let us assume, however, that we fix the 
central term diagjj,/} to remain invariant and allow the 
Jordan factors {Zi, Zi, Z 2 , Z 2 } to vary. It follows from (86) 
that 

Z2B = Z2, BZ2 = Z2 (157) 

so that the columns of Z 2 and the rows of Z 2 correspond 
to right and left-eigenvectors of B, respectively, associated 
with the eigenvalues with value one. If we replace Z 2 by any 
transformation of the form Z 2 X 2 , where X 2 is invertible, then 
by (154), Z 2 should be replaced by T’ 2 ~^,Z 2 . This conclusion 
can also be seen as follows. The new factor Z is given by 


.Z 4 ] = [ Zi Z 2 


I O' 
0 A-2 


(158) 


and, hence, the new Z ^ becomes 




.Zi 

T:’2-^.Z2 


(159) 


which confirms that Z 2 is replaced by X^'^ 2,2- It follows that 
the product Z 2 Z 2 remains invariant under arbitrary invertible 
transformations T 2 . Moreover, from (86) we also have that 


Z 16 = JZi, SZi = Zi J (160) 


Assume we replace Z\ by any transformation of the form 
Z\X\, where Xi is invertible, then by (153), Z\ should be 
replaced by X^^Z\. However, since we want to maintain J 
invariant, then this implies that the transformation Xi must 
also satisfy 

XyjXi = J (161) 

It follows that the product Zi{I — J)~^Zi remains invariant 
under such invertible transformations Xi, since 

Zi{i-j)-^Zi = ZiXixyy - jy^Xixyzi 
= ZiXi{i - xyjxi)-^xyzi 
= ZiXiil - jy^xyzi (162) 


Appendix D 
PROOF OF LEMMA 3 

We first establish that Z 2 y and yZ^ are both equal to zero. 
Indeed, we start by replacing r in (88) by its expression from 


(61) and (63) as r = C^col{rd„ 1 , • • • ,rdu,N} that leads to: 

Z2AlMC'^col{fdu,i, ■ ■ ■ ,fdu,N} = 0 (163) 

By further replacing fdu,k by their values from (32), we obtain: 

.Z2.4^AJC^diag{Bf, • • • , B^}col{rd„_i, • • • ,rdu,N} = 0 

(164) 

This relation must hold regardless of the cross-correlation 
vectors {rdu,k}- Therefore, 

Z2A^MC^dmg{B'[, ■■■ , bJ;} = 0 (165) 

We now define 

V = diagjcr^ (166) 

and rewrite expression (100) as 

y = A^MC'^dmgiBf,--- ,B^}diag{i?„4, • • • ,Ru,n} 

X diag{Bi, • • • , Bn}VCMA 2 (167) 

Multiplying this from the left by Z 2 and comparing the result 
with (165), we conclude that 

Z 2 y = 0 ( 168 ) 

Noting that y is symmetric, we then obtain: 

yz^ = 0 (169) 

Returning to recursion (102), we note first from (86) that B 
can be rewritten as 

B = ZiJZi+Z2Z2 (170) 

Since B is power convergent, the first term on the right hand 
side of (102) converges to 

^hm =Mw-i\\Iz^z2V12Z2Z2 

Substituting (170) into the second term on the right hand side 
of (102) and using (168) and (169), we arrive at 

I I 

lim y =Trf lim y^iZ^PZif 

I—¥00 ' ^ \ J \ i^oo ' ^ 

j^O j^O 

X yZi.PZi)y^ (172) 

If matrices Xi, X 2 and S are of compatible dimensions, then 
the following relations hold [6]: 

Tr(XiX 2 ) = (vec(Xj))^vec(Xi) (173) 

vec(A:iSA: 2 ) = (Xj (g) Xi)vec(S) (174) 

Using these relations in (172), we obtain 

't ^-i 

Tr(^nrn = (vec)^)) 

j=0 

i 

X ( lim y (Zi (g {ZiPZif^eciY.) (175) 

j=0 
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This is equivalent to: 

CX) OO 

= (vec(3^)) J'^)vec(S) (176) 
j=o j=o 

where ^ 

((Zi®Zi)(J® J)(Zi®Zi)) (177) 

Since p{J (gi J) < 1, the series converges and we obtain: 

't ^-i 

Tr( hm = (vec(3^)) (I - F)-\ec{T.) 

(178) 

Upon substitution of (171) and (178) into (102), we arrive at 
(104). 
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